This document records the construction of the Index. For convenience, the main steps (data treatment, normalisation and aggregation) have been condensed and simplified into a dedicated function:
Outlier treatment : Outlier treatment aims to adjust the distributions of highly skewed or fat-tailed indicators, including cases where there are outliers that are not characteristic of the rest of the distribution. This is done to improve the discriminatory power of the indicator in aggregation. For more on this, see here.
Skew and kurtosis : If the absolute skew is greater than 2 AND the kurtosis is greater than 3.5, data treatment is applied (the Winsorisation step onwards). Otherwise, the indicator is left as it is and the check is repeated for the next indicator.
Winsorisation : Winsorise up to a maximum of five points. After each Winsorised point, check whether skew and kurtosis fall back within the limits specified. If so, apply no further data treatment and move to the next indicator. If the maximum number of Winsorised points is reached and skew and kurtosis are still outside the thresholds, “undo” any Winsorised points and apply a log transformation.
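The procedure above can be sketched in a few lines. This is a minimal, language-neutral illustration in Python and not COINr's implementation: the function names (`skew_kurt`, `winsorise`, `treat`) are hypothetical, the moment estimators are simple population versions, the kurtosis threshold is assumed to apply to excess kurtosis, and the log fallback assumes strictly positive data.

```python
import math

def skew_kurt(x):
    """Return (skewness, excess kurtosis) from simple population moments."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    if m2 == 0:
        return 0.0, 0.0  # constant indicator: nothing to treat
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0

def needs_treatment(x, skew_max=2.0, kurt_max=3.5):
    """Both thresholds must be exceeded for treatment to be triggered."""
    s, k = skew_kurt(x)
    return abs(s) > skew_max and k > kurt_max

def winsorise(x, n_low, n_high):
    """Clamp the n_low smallest and n_high largest values to their nearest neighbour."""
    ordered = sorted(x)
    lo = ordered[n_low]
    hi = ordered[len(ordered) - 1 - n_high]
    return [min(max(v, lo), hi) for v in x]

def treat(x, max_points=5):
    """Winsorise point by point; fall back to a log transform of the original data."""
    if not needs_treatment(x):
        return list(x), "none"
    n_low = n_high = 0
    y = list(x)
    for _ in range(max_points):
        s, _ = skew_kurt(y)
        # Winsorise on the side of the longer tail.
        if s > 0:
            n_high += 1
        else:
            n_low += 1
        y = winsorise(x, n_low, n_high)
        if not needs_treatment(y):
            return y, f"winsorised {n_low + n_high} point(s)"
    # Winsorisation was not enough: discard it and log-transform the original values.
    return [math.log(v) for v in x], "log"

# Illustrative data only: twenty regular values plus one extreme outlier.
example = list(range(1, 21)) + [1000]
treated, method = treat(example)
print(method)
```

Each Winsorisation is recomputed from the original values, so "undoing" the Winsorised points in the fallback case simply means applying the log to the untouched input.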
#> ----------
#> Your data:
#> ----------
#> Input:
#> Units: 340 (GT0101, GT0102, GT0103, ...)
#> Indicators: 41 (S.A.1, S.A.3, S.A.4, ...)
#>
#> Structure:
#> Level 1 Indicator: 41 indicators (A.D.1, A.M.1, A.M.2, ...)
#> Level 2 Category: 12 groups (Desastres, Desplaz, Violencia, ...)
#> Level 3 Dimension: 3 groups (Amenazas, Cap_Resp, Sit_SocEc)
#> Level 4 Index: 1 groups (MVI)
A number of indicators might require data treatment. To deal with this, we follow a standard procedure, which is built in as the default.
We can see that most indicators have been dealt with by applying a log transformation as expected, whereas a few have been Winsorised. In total, after treatment four indicators still fall outside the skew/kurtosis limits. We will check these visually:
This shows a problem: that one of the indicators is unusually negatively skewed. In this case, applying a log transformation won’t work because that corrects for positive skew. To deal with this I have encoded a function in COINr which can deal with negative skew as well, and this is invoked here. In fact, it checks the direction of skew and applies the correct transformation.
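To make the idea concrete, here is a minimal sketch of a direction-aware log transformation. It is an assumption about one reasonable form, not the formula COINr uses: the name `signed_log_transform` is hypothetical, positive skew is handled with log(x - min + 1), and negative skew with -log(max - x + 1), which compresses the long left tail while keeping the original ordering.

```python
import math

def skewness(x):
    """Population skewness from simple moment estimators."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    if m2 == 0:
        return 0.0
    m3 = sum((v - mean) ** 3 for v in x) / n
    return m3 / m2 ** 1.5

def signed_log_transform(x):
    """Apply a log transform whose form depends on the direction of skew.

    Both branches are monotonically increasing, so the ranking of units
    is unchanged by the transformation.
    """
    if skewness(x) >= 0:
        lo = min(x)
        # Long right tail: compress large values.
        return [math.log(v - lo + 1) for v in x]
    hi = max(x)
    # Long left tail: reflect, log, then negate to restore the ordering.
    return [-math.log(hi - v + 1) for v in x]

# Illustrative negatively skewed data (not taken from the Index).
raw = [1, 50, 90, 95, 97, 98, 99, 100]
transformed = signed_log_transform(raw)
print(skewness(raw), skewness(transformed))
```

Negating the reflected log is what keeps high values high; a plain reflection would reverse the ranking of units.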
Now let’s check the outcome. We just focus on “C.I.6” here which is the problematic indicator:
This demonstrates the effectiveness of the new transformation: it has normalised the indicator while retaining its ordering. The scale of the indicator is now different (as with all transformations), but this is not important since indicators will in any case be scaled to between 0 and 100 in the following step, and the scaling and transformation are only for the purposes of aggregation. When presenting individual indicators, we will of course present the real data.
Normalise : Following this, we can normalise the indicators using min-max scaling, which is the default approach. This scales each indicator onto the \([0,100]\) interval.
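As a reference for what min-max scaling does, the following is a minimal sketch in Python. The function name `min_max` and its parameters are hypothetical and only illustrate the arithmetic; it does not account for indicator direction.

```python
def min_max(x, lower=0.0, upper=100.0):
    """Linearly rescale values so the minimum maps to lower and the maximum to upper."""
    lo, hi = min(x), max(x)
    if hi == lo:
        raise ValueError("cannot scale an indicator with no variation")
    return [lower + (v - lo) / (hi - lo) * (upper - lower) for v in x]

# Illustrative values only.
print(min_max([2, 4, 6]))
```

Because the scaling is linear, it changes the units of the indicator but not its shape or ordering.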
Aggregate : Now we create aggregate levels by aggregating up to the index. Different aggregation options will be generated for comparison by field experts.
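The sketch below illustrates one possible option, an equally weighted arithmetic mean applied level by level (Indicator, Category, Dimension, Index). The source does not state which aggregation methods or weights are used, so this is an assumption, and all names here (`aggregate`, `ind1`, `catA`, `dimX`) are hypothetical placeholders rather than codes from the Index.

```python
def aggregate(scores, parents):
    """Average child scores within each parent group, using equal weights."""
    groups = {}
    for child, parent in parents.items():
        groups.setdefault(parent, []).append(scores[child])
    return {parent: sum(vals) / len(vals) for parent, vals in groups.items()}

# Hypothetical normalised scores (0-100) for one unit.
indicator_scores = {"ind1": 20.0, "ind2": 40.0, "ind3": 90.0}

# Hypothetical structure: which group each element belongs to at the next level up.
ind_to_cat = {"ind1": "catA", "ind2": "catA", "ind3": "catB"}
cat_to_dim = {"catA": "dimX", "catB": "dimX"}
dim_to_index = {"dimX": "index"}

category_scores = aggregate(indicator_scores, ind_to_cat)
dimension_scores = aggregate(category_scores, cat_to_dim)
index_scores = aggregate(dimension_scores, dim_to_index)
print(category_scores, dimension_scores, index_scores)
```

Note that averaging level by level gives each group equal weight regardless of how many indicators it contains, which is generally different from averaging all indicators at once.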
Our first view of the results is as a results table. The table is sorted by default from the highest scoring (most vulnerable) municipalities downwards, based on the Index scores.
These results should be checked to see whether they agree with common sense. Another way of looking at the results is in a bar chart. Since there are a lot of municipalities, I will plot only the top thirty. They are coloured by departamento.
We can plot the same chart but broken down by Dimension scores - this can give a view of how much each dimension contributes to the total score.
As a last view of the results (for the moment), we can plot a choropleth map. This is based on the municipal shape files.
Do the results of the different options make sense to field experts?
Are there any big gaps in terms of indicators measured?
Is there a need to reshuffle indicators or categories?